NSF PAR Search | NSF Public Access Repository

Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks

Neyshabur, Behnam; Li, Zhiyuan; Bhojanapalli, Srinadh; LeCun, Yann; Srebro, Nathan (January 2019, International Conference on Learning Representations (ICLR))

Full Text Available
Geometry of Optimization and Implicit Regularization in Deep Learning

Neyshabur, Behnam; Tomioka, Ryota; Salakhutdinov, Ruslan; Srebro, Nathan (May 2017, arXiv.org)

We argue that optimization plays a crucial role in the generalization of deep learning models through implicit regularization. We do this by demonstrating that generalization ability is not controlled by network size but rather by some other implicit control. We then demonstrate how changing the empirical optimization procedure can improve generalization, even if actual optimization quality is not affected. We do so by studying the geometry of the parameter space of deep networks, and devising an optimization algorithm attuned to this geometry.
Full Text Available
Implicit Regularization in Matrix Factorization

Gunasekar, Suriya; Woodworth, Blake; Bhojanapalli, Srinadh; Neyshabur, Behnam; Srebro, Nathan (May 2017, arXiv.org)

We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix X with gradient descent on a factorization of X. We conjecture and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent on a full dimensional factorization converges to the minimum nuclear norm solution.
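The setting in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code or experimental setup: the symmetric factorization X = U U^T, the random symmetric Gaussian measurement matrices, the problem sizes, the step size, and all function names below are assumptions chosen for illustration. It runs gradient descent on a full-dimensional factor U started close to the origin and reports the final loss and the nuclear norm of the resulting X.

```python
import numpy as np

def measure(A, X):
    """Linear measurements <A_i, X> for a stack of matrices A of shape (m, n, n)."""
    return np.einsum("kij,ij->k", A, X)

def loss(A, y, X):
    """Quadratic objective (1 / 2m) * sum_i (<A_i, X> - y_i)^2."""
    return 0.5 * np.mean((measure(A, X) - y) ** 2)

def factored_gradient_descent(A, y, U0, step=1e-3, iters=20000):
    """Plain gradient descent on U for the objective evaluated at X = U U^T."""
    m = len(y)
    U = U0.copy()
    for _ in range(iters):
        residual = measure(A, U @ U.T) - y
        # For symmetric A_i, the gradient of <A_i, U U^T> with respect to U is 2 A_i U.
        grad = (2.0 / m) * np.einsum("k,kij->ij", residual, A) @ U
        U = U - step * grad
    return U

# Hypothetical problem instance: rank-1 ground truth, fewer measurements (12)
# than the 21 free entries of a symmetric 6 x 6 matrix, so it is underdetermined.
rng = np.random.default_rng(0)
n, m = 6, 12
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
X_true = np.outer(v, v)
G = rng.standard_normal((m, n, n))
A = (G + G.transpose(0, 2, 1)) / 2      # symmetrized measurement matrices
y = measure(A, X_true)

U0 = 1e-3 * np.eye(n)                   # full-dimensional factor near the origin
U = factored_gradient_descent(A, y, U0)
X_hat = U @ U.T

initial_loss = loss(A, y, U0 @ U0.T)
final_loss = loss(A, y, X_hat)
print("initial loss:", initial_loss)
print("final loss:", final_loss)
print("nuclear norm of X_hat:", np.linalg.norm(X_hat, "nuc"))
print("nuclear norm of X_true:", np.linalg.norm(X_true, "nuc"))
```

Because X_hat is positive semidefinite by construction, its nuclear norm equals its trace. Comparing that value against the output of a convex minimum nuclear norm solver on the same measurements would be the natural next check, which this sketch leaves out.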
Full Text Available
Global optimality of local search for low rank matrix recovery

Bhojanapalli, Srinadh; Neyshabur, Behnam; Srebro, Nathan (May 2016, arXiv.org)

We show that there are no spurious local minima in the non-convex factorized parametrization of low-rank matrix recovery from incoherent linear measurements. With noisy measurements we show all local minima are very close to a global optimum. Together with a curvature bound at saddle points, this yields a polynomial time global convergence guarantee for stochastic gradient descent from random initialization.
Full Text Available
Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

Neyshabur, Behnam; Wu, Yuhuai; Salakhutdinov, Ruslan; Srebro, Nathan (May 2016, arXiv.org)

We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of the path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve the trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes.
Full Text Available
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Srivastava, Aarohi; Rastogi, Abhinav; Rao, Abhishek; Shoeb, Abu Awal; Abid, Abubakar; Fisch, Adam; Brown, Adam R.; Santoro, Adam; Gupta, Aditya; Garriga-Alonso, Adri; et al. (January 2023, Transactions on Machine Learning Research)

Full Text Available